On Discovering Concept Entities from Web Sites
Authors
Abstract
A web site usually contains a large number of concept entities, each consisting of one or more web pages connected by hyperlinks. To discover these concept entities for more expressive web site queries and other applications, the web unit mining problem has been proposed. Web unit mining aims to determine the web pages that constitute a concept entity and to classify concept entities into categories. However, the performance of an existing web unit mining algorithm, iWUM, suffers because it may create more than one web unit (that is, incomplete web units) from a single concept entity. This paper presents a new web unit mining algorithm, kWUM, which incorporates site-specific knowledge to discover incomplete web units and handle them by merging them and assigning the correct labels. Experiments show that overall accuracy improves significantly.
Similar resources
An introduction to methods of discovering and identifying ancient sites with emphasis on evidence and geomorphologic techniques
Recognizing the position of ancient sites is of great help to archaeologists. After this recognition, the archaeologist, relying on the knowledge and usual techniques of archaeology, can determine the extent of the sites. With this information, the archaeologist can learn about the social, economic, livelihood, and political past of the sites. In this researc...
Discovery of Concept Entities from Web Sites using Web Unit Mining
A web site usually contains a large number of concept entities, each consisting of one or more web pages connected by hyperlinks. In order to discover these concept entities for more expressive web site queries and other applications, the web unit mining problem has been proposed. Web unit mining aims to determine web pages that constitute a concept entity and classify concept entities into cat...
Presenting a method for extracting structured domain-dependent information from Farsi Web pages
Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...
A Survey on Web Service Discovering and Composition
This paper reviews the existing techniques used in discovering and composing services. The task of selecting an adequate service can quickly grow tedious if all services that are listed under a certain description have to be compared manually for the final selection. Moreover, the final selection does not only depend on service parameters like execution costs or accuracy, but de...
Discovering Entity Knowledge Bases on the Web
Recognition and disambiguation of named entities in text is a knowledge-intensive task. Systems are typically bound by the resources and coverage of a single target knowledge base (KB). In place of a fixed knowledge base, we attempt to infer a set of endpoints which reliably disambiguate entity mentions on the web. We propose a method for discovering web KBs and our preliminary results suggest ...
Publication date: 2005